[BFCL] Leaderboard Update - 2024/12/29 (Checkpoint 0cea216) #845

Merged
merged 6 commits into from
Dec 31, 2024

@HuanzhiMao HuanzhiMao added the BFCL-Website BFCL Leaderboard Website label Dec 19, 2024

@HuanzhiMao HuanzhiMao changed the title [BFCL] Leaderboard Update [BFCL] Leaderboard Update - 2024/12/29 (Checkpoint 0cea216) Dec 29, 2024
@HuanzhiMao HuanzhiMao marked this pull request as ready for review December 29, 2024 16:23
@CharlieJCJ
Collaborator

CharlieJCJ commented Dec 31, 2024

DIFF of 12/06 and 12/29

changes_heatmap

Live Accuracy Table

live_acc_table_heatmap

Non-Live AST Accuracy Table

non-live_ast_acc_table_heatmap

Multi-Turn Accuracy Table

multi_turn_acc_table_heatmap

@HuanzhiMao
Collaborator Author

HuanzhiMao commented Dec 31, 2024

This leaderboard update excludes BitAgent/GoGoAgent because their endpoint appears to be erroring out. cc @RogueTensor

Update: Their endpoint is fixed, and we have updated their score.

@CharlieJCJ
Collaborator

DIFF of 12/06 and 12/29

changes_heatmap

Live Accuracy Table

live_acc_table_heatmap

Non-Live AST Accuracy Table

non-live_ast_acc_table_heatmap

Non-Live Exec Accuracy Table

non-live_exec_acc_table_heatmap

Multi-Turn Accuracy Table

multi_turn_acc_table_heatmap

Collaborator

@CharlieJCJ CharlieJCJ left a comment

Reviewed all numbers, LGTM

@HuanzhiMao HuanzhiMao merged commit 1045ee2 into ShishirPatil:gh-pages Dec 31, 2024
@HuanzhiMao HuanzhiMao deleted the leaderboard-update branch December 31, 2024 13:01
ShishirPatil pushed a commit that referenced this pull request Jan 5, 2025

…867)

This PR uses the same checkpoint as PR #845 and serves as a follow-up to
it. Previously, we could not obtain results for some models (namely, the
Gemini 2.0 Flash series) due to rate-limit constraints. Those results are
now available, and their scores have been added in this PR.

The scores for other models remain unchanged.